Hyper-systolic algorithms for N-body computations and parallel level-3 BLAS libraries
Abstract
Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their regular communication and compute patterns, they are well suited for implementation on most parallel architectures; in particular, high-performance SIMD machines can benefit considerably. After a short explanation of the concept of hyper-systolic algorithms, their application to N-body computations and distributed matrix multiplication is discussed. Results from real implementations are presented.
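To make the underlying communication pattern concrete, the following is a minimal single-process sketch of the plain systolic ring scheme that hyper-systolic algorithms refine: p "processors" each hold one datum, and a travelling replica is shifted around the ring p-1 times so that every pair meets once. The function name `systolic_pairwise` and the toy 1-D interaction are illustrative assumptions, not the paper's actual code; the hyper-systolic variant would reduce the number of shifts by choosing a stride pattern instead of unit shifts.

```python
def systolic_pairwise(x, interact):
    """Accumulate sum over j != i of interact(x[i], x[j]) via ring shifts.

    Each list slot plays the role of one processor in the systolic ring;
    `y` is the travelling copy that is shifted one position per step.
    """
    p = len(x)
    acc = [0.0] * p          # per-"processor" accumulator
    y = list(x)              # travelling replica of the data
    for _ in range(p - 1):
        y = y[1:] + y[:1]    # one systolic shift around the ring
        for i in range(p):
            acc[i] += interact(x[i], y[i])
    return acc

# Toy example: an inverse-square scalar interaction on three 1-D "particles"
forces = systolic_pairwise([0.0, 1.0, 3.0], lambda a, b: 1.0 / (a - b) ** 2)
```

After p-1 shifts, every processor has seen every other datum exactly once, which is why the compute pattern is fully regular and maps well onto SIMD hardware.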
Similar resources
Software Libraries for Linear Algebra Computations on High Performance Computers (technical paper accepted for publication in SIAM Review)
This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under develo...
Hyper-Systolic Implementation of BLAS-3 Routines on the APE100/Quadrics
Basic Linear Algebra Subroutines (BLAS-3) [1] are building blocks for solving many numerical problems (Cholesky factorization, Gram-Schmidt orthonormalization, LU decomposition, ...). Their efficient implementation on a given parallel machine is a key issue for the maximal exploitation of the system's computational power. In this work we refer to a massively parallel processing SIMD machine (the AP...
Efficiency of Reproducible Level 1 BLAS
Numerical reproducibility failures appear in massively parallel floating-point computations. One way to guarantee numerical reproducibility is to extend IEEE-754 correct rounding to larger computing sequences, for instance to the BLAS libraries. Is the extra cost of numerical reproducibility acceptable in practice? We present solutions and experiments for the level 1 BLAS and we conc...
Implementing BLAS Level 3 on the CAP-II
The Basic Linear Algebra Subprograms (BLAS) library is widely used in many supercomputing applications, and is used to implement more extensive linear algebra subroutine libraries, such as LINPACK and LAPACK. The use of BLAS aids the clarity, portability, and maintenance of mathematical software. BLAS level 1 routines involve vector-vector operations, level 2 routines involve matrix-vector ope...
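The three BLAS levels named above can be sketched in a few lines. This is an illustrative assumption-laden example using NumPy as a stand-in for a BLAS library (NumPy's `@` operator typically dispatches to an underlying BLAS); the routine names AXPY, GEMV, and GEMM in the comments are the standard BLAS names for these operations.

```python
import numpy as np

n = 4
x = np.ones(n)                       # vector of ones
y = np.arange(n, dtype=float)        # vector [0, 1, 2, 3]
A = np.eye(n)                        # identity matrix
B = np.full((n, n), 2.0)             # matrix of twos

axpy = 2.0 * x + y                   # level 1: vector-vector (AXPY: y <- a*x + y)
gemv = A @ y                         # level 2: matrix-vector (GEMV)
gemm = A @ B                         # level 3: matrix-matrix (GEMM)
```

Level 3 routines such as GEMM perform O(n^3) arithmetic on O(n^2) data, which is what makes them the preferred building blocks on machines where communication or memory traffic, rather than arithmetic, is the bottleneck.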
Journal: Parallel Computing
Volume: 25, Issue: -
Pages: -
Publication year: 1999